Feature Selection Forcing Overtraining May Help to Improve Performance
Authors
Abstract
One of the main drawbacks of Machine Learning systems is the negative effect of overtraining: if the points in the dataset are fitted perfectly, generalization performance is usually poor. We propose to take advantage of overtraining, together with Feature Selection, to improve the performance of a learning system. The underlying hypothesis is that when the dataset is fitted as closely as possible, the system is forced to use all the available variables as much as possible. Noisy and useless variables can then be detected because generalization improves when the system is not allowed to use them, and forcing overtraining should make such variables stand out more clearly. To test this hypothesis, we performed several Feature Selection experiments using Feed-forward Neural Networks, with Sequential Backward Selection as the Feature Selection procedure. Experimental results on several real-world problems suggest that the hypothesis is well-founded. Ironically, forcing overtraining may help to achieve good performance.
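The abstract names Sequential Backward Selection but gives no implementation details, so the following is only a minimal sketch of the general procedure, not the authors' code. The function `sequential_backward_selection`, the `evaluate` callback, and `toy_validation_error` are hypothetical names introduced for illustration. In the setting described above, `evaluate` would train a feed-forward network on the given feature subset until the training data is fitted as closely as possible and return an estimate of generalization error; here a hand-made integer score stands in for that step. Returning the best subset seen along the removal path is also an assumption.

```python
from typing import Callable, FrozenSet, List, Tuple

Subset = FrozenSet[int]


def sequential_backward_selection(
    n_features: int,
    evaluate: Callable[[Subset], float],
    min_features: int = 1,
) -> Tuple[Subset, float, List[Tuple[Subset, float]]]:
    """Greedy backward elimination driven by a validation-error callback.

    Starts from all features and, at each step, removes the single feature
    whose removal yields the lowest error. Returns the best subset seen,
    its error, and the full removal history.
    """
    current: Subset = frozenset(range(n_features))
    history: List[Tuple[Subset, float]] = [(current, evaluate(current))]

    while len(current) > min_features:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [(evaluate(current - {f}), f) for f in sorted(current)]
        # Lowest error wins; ties are broken by the smaller feature index.
        best_error, removed = min(candidates)
        current = current - {removed}
        history.append((current, best_error))

    best_subset, best_error = min(history, key=lambda item: item[1])
    return best_subset, best_error, history


def toy_validation_error(subset: Subset) -> int:
    """Hypothetical stand-in for 'overtrain a network, then measure error'.

    Features 0 and 1 are treated as useful (each lowers the error),
    features 2 and 3 as noisy (each raises it). Units are arbitrary.
    """
    useful = {0, 1}
    noisy = {2, 3}
    return 10 - 4 * len(useful & subset) + 1 * len(noisy & subset)


best, error, path = sequential_backward_selection(4, toy_validation_error)
print(sorted(best), error)
```

With this toy score the noisy features 2 and 3 are removed first, and the subset containing only features 0 and 1 has the lowest error along the path. Swapping in a real evaluator only requires a callable that maps a set of feature indices to an error estimate.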
Similar resources
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)
Machine learning-based classification techniques provide support for the decision-making process in the field of healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous in nature, and their high dimensionality results in a slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...
A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems
Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...
Trainability of young athletes and overtraining.
Exercise adaptations to strength, anaerobic and aerobic training have been extensively studied in adults; however, young people appear to respond differently to such exercise stimuli than adults do. In addition, because overtraining in young athletes has received little attention, this important area is also discussed. Resistance training in children can be safe and effective. It has ...